Instrumental Variables

Our first steps with instrumental variables in R

Welcome

Introduction!

Welcome to our fifth tutorial for the Statistics II: Statistical Modeling & Causal Inference (with R) course. During this week’s lecture you reviewed what happens when experiments break due to non-compliance. You were also introduced to encouragement experimental setups, and their the observational analogue, instrumental variables. Finally, you learned how you can estimate local average treatment effects by breaking out the variation in D into two parts.

In this lab session we will:

  • Review how to manually extract the LATE through the Wald estimator
  • Learn how to perform Two-stage Least Squares regression (2SLS) with iv_robust() from the estimatr package
  • Illustrate the mechanics of Two-stage Least Squares regression (2SLS) with lm()

Packages

# These are the libraries we will use today. Make sure to install them in your console in case you have not done so previously.

library(tidyverse) # To load the collection of packages in the tidyverse
library(dplyr) # To wrangle our data (this package is also loaded by library(tidyverse))
library(readr) # To load the .csv data (this package is also loaded by library(tidyverse))
library(ggplot2) # To create plots (this package is also loaded by library(tidyverse))
library(tidyr) # To use pivot_wider() (this package is also loaded by library(tidyverse))
library(broom) # To format regression output
library(modelsummary) # To format model output
library(knitr) # To create HTML tables with kable()
library(kableExtra) # To format the HTML tables

library(janitor) # To examine our data with tabyl()
library(estimatr) # To run 2SLS with iv_robust()

Measuring the effect of mosquito net use on malaria infections

Imagine that the organization you work for is laying out a project to distribute mosquito nets to help combat malaria transmissions.

The funding agency requires a impact evaluation report from your organization. You are in charge of running the evaluation of this program.

You realize that there are potential unobserved confounders that could bias the observed differences in malaria risk for mosquito net users and non-users. You also think about the ethical considerations of fully randomizing who receives the nets, so you remember your Statistics II lecture on IVs and set up an encouragement design.

Your beneficiaries are scattered across ten villages. You decide to randomly select five villages to send SMS reminders every night encouraging them to use the mosquito nets. (This example is using simulated data)


Assumptions

To render credible results for the evaluation of this program, we need to fulfill a certain set of assumptions:

a) Relevance: Also known as non-zero average encouragement effect. Does our \(Z\) create variation in our \(D\)? In other words, is the mosquito net use different under the encouragement group? (Statistically testable)

b) Exogeneity/Ignorability of the instrument: Potential outcomes and treatments are independent of \(Z\). In this case given by out randomization of encouragement by villages.

c) Exclusion restriction: The instrument only affects the outcome via the treatment. In other words, there are no alternative paths through which our SMS can have an effect on malaria infections other that the use of the mosquito nets.

d) Monotonicity: No defiers. We assume that non-compliers fall in the camp of always- and never-takers. We would not expect subjects who when encouraged would not use the nets, but would use them if they did not receive the SMS reminder.


Exploring our data

evaluation_df <- readr::read_csv("https://raw.githubusercontent.com/seramirezruiz/stats-ii-lab/master/Session%205/data/evaluation_data.csv") # loading simulated data frame of the intervention

You receive the results of your intervention from the M&E officers. There are 1000 inhabitants across the ten villages. This is what the data look like:

  • village_name: A character string with the name of the village
  • sms: A binary marker for the SMS encouragement
  • net_use: A binary marker for mosquito net use
  • malaria: A binary marker for malaria infection

Compliance types

You may remember, this table from the lecture:


We can cross-tabulate our data with janitor::tabyl() and the additional features of the janitor::adorn_* functions.

Why janitor::tabyl()? Because as prospective policy analysts we will do a lot of counting.

As the vignette of the package even puts it:

Analysts do a lot of counting. Indeed, it’s been said that ‘data science is mostly counting things.’ But the base R function for counting, table(), leaves much to be desired:

  • It doesn’t accept data.frame inputs (and thus doesn’t play nicely with the %>% pipe)
  • It doesn’t output data.frames
  • Its results are hard to format. Compare the look and formatting choices of an R table to a Microsoft Excel PivotTable.

This is how it works. Say we are interested in exploring the number of persons in each of the observed strata, we would do:

evaluation_df %>% # your data frame
  janitor::tabyl(net_use, sms) %>% # the two dimensions for the table (D, Z)
  janitor::adorn_totals(c("row", "col")) %>% # add totals for rows and cols
  knitr::kable() %>% # turn into a kable table for nice rendering in HTML
  kableExtra::kable_styling() %>% 
  kableExtra::add_header_above(c("", "sms" = 2, "")) #add header for sms
sms
net_use 0 1 Total
0 335 110 445
1 165 390 555
Total 500 500 1000

If you want to learn more about the syntax of tabyl(), make sure to check the vignette


Exercise

Let’s explore the compliance types from this table

  • Where are our compliers and non-compliers?
  • How many people were encouraged via SMS, but did not use the net?
  • How many people were not encouraged via SMS, yet they utilized the net?

Average malaria infections across strata

We can utilize the tabyl() syntax and our knowledge from the grammar of graphics to table and visualize the distribution of malaria on each stratum:

Table: Count of malaria infections across (Y,Z)
evaluation_df %>% # your data frame
  janitor::tabyl(malaria, sms) %>% # the two dimensions for the table (Y, Z)
  janitor::adorn_totals(c("row", "col")) %>% # add totals for rows and cols
  knitr::kable() %>% # turn into a kable table for nice rendering in HTML
  kableExtra::kable_styling() %>% 
  kableExtra::add_header_above(c("", "sms" = 2, "")) #add header for sms
sms
malaria 0 1 Total
0 248 410 658
1 252 90 342
Total 500 500 1000

Plot: Distribution of malaria infections across strata
ggplot(evaluation_df, aes(x = factor(sms), 
                          y = factor(net_use), 
                          color = factor(malaria))) +
  geom_jitter() +
  theme_minimal() +
  scale_x_discrete(labels = c("SMS = 0", "SMS = 1")) +
  scale_y_discrete(limits = c("1","0"), labels = c("NET = 1", "NET = 0")) +
  scale_color_manual(values = c("#CDCDCD", "#CC0055"),
                     labels = c("Not infected", "Infected")) +
  labs(x = "Encouragement",
       y = "Treatment",
       color = "") +
  theme(legend.position = "bottom")


Exercise

  • What insights can we gather from the table and plot?

Exploring our set-up

Let’s check whether SMS encouragement is a strong instrument

In other words, we are looking at the relevance assumption. Does our SMS encouragement create changes in our mosquito net use?

lm(net_use ~ sms, data = evaluation_df) %>%
  modelsummary::modelsummary(stars = c('*' = .1, '**' = .05, '***' = .01),
                             statistic = 'conf.int',
                             fmt = 2)
Model 1
(Intercept) 0.33***
[0.29, 0.37]
sms 0.45***
[0.39, 0.51]
Num.Obs. 1000
R2 0.205
R2 Adj. 0.204
AIC 1216.0
BIC 1230.7
Log.Lik. −605.010
F 257.315
RMSE 0.44
* p

Economists have established as a “rule-of-thumb” for the case of a single endogenous regressor to be considered a strong instrument should have a F-statistic 1 greater than 10. From this regression, we can see that SMS encouragement is a strong instrument.

Additionally, the substantive read in this case is that only 33% of those who did not receive the SMS utilized the mosquito nets, whereas 78% of those who got the SMS encouragement did.


Let’s gather a naïve estimate of mosquito net use and malaria infection.

Why do you think we call this a naive estimate?

naive_model <- lm(malaria ~ net_use, data = evaluation_df)

modelsummary::modelsummary(naive_model,
                           stars = c('*' = .1, '**' = .05, '***' = .01),
                           statistic = 'conf.int',
                           fmt = 2)
Model 1
(Intercept) 0.68***
[0.65, 0.72]
net_use −0.61***
[−0.66, −0.57]
Num.Obs. 1000
R2 0.415
R2 Adj. 0.414
AIC 816.8
BIC 831.5
Log.Lik. −405.407
F 707.002
RMSE 0.36
* p

Exercise

  • What would our expectations be under the naive model?




Let’s gather our intent-to-treat effect (ITT)

This is the effect that our SMS encouragement had on malaria infections. \[ITT = E(Malaria_i|SMS=1) - E(Malaria_i|SMS=0)\]

itt_model <- lm(malaria ~ sms, data = evaluation_df)
modelsummary::modelsummary(itt_model,
                           stars = c('*' = .1, '**' = .05, '***' = .01),
                           statistic = 'conf.int',
                           fmt = 2)
Model 1
(Intercept) 0.50***
[0.46, 0.54]
sms −0.32***
[−0.38, −0.27]
Num.Obs. 1000
R2 0.117
R2 Adj. 0.116
AIC 1228.4
BIC 1243.1
Log.Lik. −611.190
F 131.753
RMSE 0.45
* p

Exercise

  • What does this tell us?




Let’s gather out local average treatment effect (LATE)

We have several options for this:

  • Wald Estimator We can calculate this by hand. x. We can also calculate the average malaria rates among those who did and did not receive an SMS (no SMS = 0.504, yes SMS = 0.18).
Table: Observations across strata (D,Z)
sms
net_use 0 1 Total
0 335 110 445
1 165 390 555
Total 500 500 1000

Table: Count of malaria infections across strata (Y,Z)
sms
malaria 0 1 Total
0 248 410 658
1 252 90 342
Total 500 500 1000

Local Average Treatment Effect (LATE) manually


\[LATE = \frac{cov(Y_i,Z_i)}{cov(D_i,Z_i)}\]

Let’s take a look at our numerator \(cov(Y_i,Z_i)\), also \(ITT\)

\(E(Y|Z = 1) = \frac{90}{(410+90)} = 0.18\)
\(E(Y|Z = 0) = \frac{252}{(252+248)} = 0.504\)

Let’s take a look at our denominator \(cov(D_i,Z_i)\)

\(E(D∣Z = 1) = \frac{390}{(390 + 110)} = 0.78\)
\(E(D∣Z = 0) = \frac{165}{(165 + 335)} = 0.33\)

We can calculate our LATE

\[LATE = \frac{(0.18 - 0.504)}{(0.78 - 0.33)} = -0.72\]


  • Two-stage Least Squares (2SLS). We will learn how to do this with iv_robust(), which is part of the estimatr package. It fits instrumental-variable regression through two-stage least squares. The syntax is the following:
estimatr::iv_robust(outcome ~ treatment | instrument, data)

In our case:

late_model <- estimatr::iv_robust(malaria ~ net_use | sms, data = evaluation_df)

modelsummary::modelsummary(late_model,
                           stars = c('*' = .1, '**' = .05, '***' = .01),
                           statistic = 'conf.int',
                           fmt = 2)
Model 1
(Intercept) 0.74***
[0.68, 0.80]
net_use −0.72***
[−0.82, −0.62]
Num.Obs. 1000
R2 0.402
R2 Adj. 0.402
AIC 837.4
BIC 852.1
RMSE 0.37
* p

Exercise

  • What is the substantive reading of these results?
  • What would you tell the funding partner in your evaluation report?




Mechanics behind two-stage least squares (2SLS)

What estimatr::iv_robust() is doing in the background is the following:

net_use_hat <- lm(net_use ~ sms, evaluation_df)$fitted.values # get fitted values from first stage (the part of x that is exogenously driven by z)
summary(lm(evaluation_df$malaria ~ net_use_hat)) # run second stage with instrumented x (careful, the standard errors are wrong; better use iv_robust() from estimatr instead)
## 
## Call:
## lm(formula = evaluation_df$malaria ~ net_use_hat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.504 -0.180 -0.180  0.496  0.820 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.74160    0.03757   19.74   <2e-16 ***
## net_use_hat -0.72000    0.06273  -11.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4463 on 998 degrees of freedom
## Multiple R-squared:  0.1166, Adjusted R-squared:  0.1157 
## F-statistic: 131.8 on 1 and 998 DF,  p-value: < 2.2e-16




Thinking about the validity of instruments

We can also adapt the iv_robust() syntax to accommodate valid conditional instruments:

estimatr::iv_robust(Y ~ D + W | Z + W, data = your_df)

Acknowledgements

This script was drafted by Lisa Oswald and Sebastian Ramirez-Ruiz, with contributions by Adelaida Barrera Daza, Carolina Díaz Suárez, Sofía García-Durrer, and William Fernandez.


  1. An F statistic is a value you get when you run an ANOVA test or a regression analysis to find out if the means between two populations are significantly different. It’s similar to a t-statistic from a t-test; A t-test will tell you if a single variable is statistically significant and an F test will tell you if a group of variables are jointly significant.↩︎